FIG. 1: Graphics pipeline with four stages: OpenGL/Direct3D API (10), transform/lighting (12), rasterization (14), and fragment operations (16). Methods 1, 2, and 3 show three ways of dividing these stages between the host CPU and hardware.



FIG. 4: Rearrangement of texel data. The 8-bit components R0-R7, G0-G7, B0-B7, and A0-A7 of eight 32-bit texels are held in per-texel order (80), regrouped into per-color order (82), and form four partitions R, G, B, and A (84).
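The regrouping in FIG. 4 can be sketched in Python as follows. This is a minimal illustration, not the hardware's method: it assumes each 32-bit texel packs R in the lowest byte, then G, B, and A. The figure does not state the bit order, and the function name `split_rgba` is hypothetical.

```python
def split_rgba(texels):
    """Regroup packed 32-bit RGBA texels into four per-color partitions.

    Assumed packing (not stated in FIG. 4): R in bits 0-7, G in bits 8-15,
    B in bits 16-23, A in bits 24-31.
    """
    partitions = {"R": [], "G": [], "B": [], "A": []}
    for texel in texels:
        for color, shift in (("R", 0), ("G", 8), ("B", 16), ("A", 24)):
            partitions[color].append((texel >> shift) & 0xFF)
    return partitions


# Eight texels, i.e. two 2x2 blocks.
texels = [0x40302010 + i for i in range(8)]
parts = split_rgba(texels)
print(parts["R"])  # [16, 17, 18, 19, 20, 21, 22, 23]
```

After the split, each color occupies one contiguous partition of eight 8-bit elements, which is the layout the partitioned operations in the later figures work on.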



FIG. 2 (PRIOR ART): Block diagram of a conventional graphics accelerator: host CPU (18), rasterization unit (22), Z-buffer unit (24), texture address unit, texture cache, texture filtering unit, blending unit, configuration registers, memory controller, and memory (36). The rasterization (14) and fragment operations (16) stages are indicated.



FIG. 3: Block diagram of the enhanced architecture: host CPU (18), enhanced rasterization unit, enhanced Z-buffer unit, enhanced texture address unit, enhanced texture cache, vector input unit (50), vector processing engine (60), vector function unit, instruction cache (64), register file, blending unit, output buffer, write buffer, vector output unit (70), configuration registers, memory controller, and memory (36).



FIG. 5: Flowchart:
Load two 2x2 blocks of 32-bit data from the enhanced texture cache.
Separate the data into four partitions of red, green, blue, and alpha (8 elements at 8 bits each).
Rearrange the partitions into four 8-bit data for each color.
Perform a 4-partitioned inner product (eight 8-bit elements in each partition).
Continue.
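FIG. 5 does not give the weights used in the inner product. A minimal Python sketch of a 4-partitioned inner product, where `partitioned_inner_product` and the example weights are hypothetical and chosen only for illustration:

```python
def partitioned_inner_product(partitions, coefficients):
    """Return one inner product per color partition.

    Both arguments map "R", "G", "B", "A" to equal-length lists
    (eight elements each in FIG. 5).
    """
    result = {}
    for color in "RGBA":
        data, weights = partitions[color], coefficients[color]
        if len(data) != len(weights):
            raise ValueError(f"partition {color}: length mismatch")
        result[color] = sum(x * w for x, w in zip(data, weights))
    return result


# Hypothetical weights: average the first 2x2 block only (4 x 64 = 256 total).
weights = [64, 64, 64, 64, 0, 0, 0, 0]
parts = {"R": [10, 20, 30, 40, 99, 99, 99, 99], "G": [0] * 8,
         "B": [255] * 8, "A": [255] * 8}
sums = partitioned_inner_product(parts, {c: weights for c in "RGBA"})
print({c: s >> 8 for c, s in sums.items()})  # {'R': 25, 'G': 0, 'B': 255, 'A': 255}
```

All four partitions are processed by the same operation, which is what makes the computation suitable for a single partitioned (SIMD-style) instruction.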



Figure (caption not recovered): source data (100) holding elements R0-R7, G0-G7, B0-B7, and A0-A7, and coefficient data (102) holding elements KR0-KR7, KG0-KG7, KB0-KB7, and KA0-KA7.




FIG. 7: Flowchart:
Start.
Read source data (e.g., 256 bits).
Separate the source data into four partitions of red, green, blue, and alpha (e.g., 8 elements at 8 bits each).
Read coefficient data (e.g., 512 bits = 32 elements x 16 bits).
Separate the coefficient data into four partitions of red, green, blue, and alpha (e.g., 8 elements at 16 bits each).
Set SUM = 0.
While a decision (wording not recovered) answers NO: multiply source data by coefficient data (e.g., an 8-bit element by a 16-bit element) and add the result to SUM.
When the decision answers YES: right shift SUM, then concatenate with the previous result.
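The loop in FIG. 7 can be sketched in Python as below. The function names are hypothetical. The figure says only "right shift sum" and "concatenate with previous result", so the shift amount, the saturation to 8 bits, and the concatenation order (R ending up in the most significant byte) are all assumptions of this sketch.

```python
def filter_partition(source, coeffs, shift):
    """Multiply-accumulate source elements with coefficients, then right shift."""
    if len(source) != len(coeffs):
        raise ValueError("source and coefficient partitions must have equal length")
    total = 0                      # SUM = 0
    for s, k in zip(source, coeffs):
        total += s * k             # multiply, then add the result to SUM
    value = total >> shift         # right shift SUM
    return max(0, min(255, value)) # assumption: saturate to 8 bits


def filter_pixel(src_parts, coef_parts, shift):
    """Filter each color partition and concatenate the 8-bit results into one word."""
    word = 0
    for color in "RGBA":
        # Concatenate with the previous result (assumed order: R first, so R is the top byte).
        word = (word << 8) | filter_partition(src_parts[color], coef_parts[color], shift)
    return word


src = {"R": [200] * 8, "G": [100] * 8, "B": [100] * 8, "A": [255] * 8}
coef = {c: [32] * 8 for c in "RGBA"}   # eight coefficients summing to 256
print(hex(filter_pixel(src, coef, shift=8)))  # 0xc86464ff
```

With coefficients that sum to a power of two, the right shift acts as the normalizing divide, so a shift of 8 matches coefficients summing to 256.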




FIG. 8: Block diagram of a dual vector pipeline: host CPU (18), vector input units 1 and 2 (50a, 50b), instruction cache (64), a vector processing engine (VPE) with vector function units VFU1 and VFU2 (66a, 66b) and register files RF1 and RF2, vector output units 1 and 2 (70a, 70b), memory controller, configuration registers, and memory (36).









FIG. 9: Flowchart:
The host CPU provides base addresses, widths, strides, coefficients, etc., to the configuration registers.
Components reference the configuration registers.
Components are set to perform the type of operation indicated by the configuration registers.
Components perform the type of operation indicated by the configuration registers.



FIG. 10A: Flowchart for averaging image 0 and image 1 (continued in FIG. 10B):
The enhanced rasterization unit generates 32 (u, v) source coordinates for image 0 and a compressed representation of 32 (x, y) output coordinates.
The enhanced rasterization unit passes the 32 (u, v) coordinates for image 0 and the compressed 32 (x, y) output coordinates to the enhanced Z-buffer.
The enhanced rasterization unit generates 32 (u, v) coordinates for image 1.
The enhanced Z-buffer passes the 32 (u, v) coordinates for image 0 and the compressed 32 (x, y) output coordinates to the enhanced texture address unit.
The enhanced texture address unit converts the 32 (u, v) coordinates for image 0 into 32 addresses storing image 0 source data.
The enhanced rasterization unit passes the 32 (u, v) coordinates for image 1 to the enhanced Z-buffer.
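The figures do not say how the enhanced texture address unit maps (u, v) coordinates to addresses. A minimal sketch, assuming a linear row-major surface described by a base address and a stride (two of the values FIG. 9 lists as configuration-register contents) and 32-bit texels; `SurfaceConfig` and `uv_to_addresses` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SurfaceConfig:
    """Hypothetical configuration-register contents for one image."""
    base_address: int
    stride: int               # bytes per row
    bytes_per_texel: int = 4  # assumption: 32-bit texels


def uv_to_addresses(coords, cfg):
    """Convert integer (u, v) coordinates to byte addresses.

    Assumes a linear row-major layout: base + v * stride + u * bytes_per_texel.
    """
    return [cfg.base_address + v * cfg.stride + u * cfg.bytes_per_texel
            for u, v in coords]


# 32 coordinates along row 3 of image 0.
image0 = SurfaceConfig(base_address=0x10000, stride=1024)
addrs = uv_to_addresses([(u, 3) for u in range(32)], image0)
print(hex(addrs[0]), hex(addrs[31]))  # 0x10c00 0x10c7c
```

Real texture hardware often uses tiled rather than linear layouts; only the base-plus-stride form is suggested by the values named in FIG. 9.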



FIG. 10B: Flowchart, continued:
The enhanced texture address unit passes the 32 addresses storing image 0 source data and the compressed output coordinates to the enhanced texture cache.
The enhanced texture cache retrieves image 0 source data from the 32 addresses in memory and puts the data into an internal data cache area.
The enhanced texture cache loads image 0 source data from the internal data cache area to an internal line buffer.
The enhanced texture cache passes image 0 source data to a queue 0 of the VFU and the compressed output coordinates to a register of the VFU.
The enhanced Z-buffer passes the 32 (u, v) coordinates for image 1 to the enhanced texture address unit.
The enhanced texture address unit converts the 32 (u, v) coordinates for image 1 into 32 addresses at which image 1 source data are stored.



FIG. 10C: Flowchart, continued:
The enhanced texture address unit passes the 32 addresses at which image 1 source data are stored to the enhanced texture cache.
The enhanced texture cache retrieves image 1 source data from the 32 addresses in memory and puts the data into an internal data cache area.
The enhanced texture cache loads image 1 source data from the internal data cache area to an internal line buffer.
On the YES branch of a decision (wording not recovered), the enhanced texture cache passes image 1 source data to a queue 1 of the VFU.
The VFU adds each 8-bit partition of queue 0 and queue 1 together and right shifts the sum by 1 to compute the average.
The VFU passes the resulting average values and the compressed output coordinates to the output buffer.
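The averaging step of FIG. 10C can be sketched in Python on 256-bit values holding 32 8-bit lanes. The function names are hypothetical. The first function follows the figure literally (add, then shift right by 1), keeping each 9-bit sum inside its own lane. The second is a standard software (SWAR) identity, not taken from the source: floor((a + b) / 2) == (a & b) + ((a ^ b) >> 1), with the low bit of every lane masked off so no bit crosses a lane boundary.

```python
LANES = 32                          # 32 lanes x 8 bits = 256 bits
LANE_MASK_FE = int("FE" * LANES, 16)


def average_packed(q0, q1):
    """Per-lane (a + b) >> 1, computed lane by lane so carries never cross lanes."""
    result = 0
    for i in range(LANES):
        a = (q0 >> (8 * i)) & 0xFF
        b = (q1 >> (8 * i)) & 0xFF
        result |= ((a + b) >> 1) << (8 * i)
    return result


def average_packed_swar(q0, q1):
    """Same result on the whole 256-bit word at once (SWAR averaging identity)."""
    return (q0 & q1) + (((q0 ^ q1) & LANE_MASK_FE) >> 1)


queue0 = int("C8" * LANES, 16)      # every lane 200
queue1 = int("65" * LANES, 16)      # every lane 101
print(hex(average_packed(queue0, queue1) & 0xFF))  # 0x96, i.e. (200 + 101) >> 1 = 150
```

In hardware the per-lane adder simply carries one extra bit; the SWAR form shows why the result still fits in 8 bits per lane.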



FIG. 10D: Flowchart, continued:
The output buffer concatenates the average values into 256 bits and generates a compressed output address and mask.
The output buffer passes the concatenated values and the compressed output address and mask to the write buffer.
The write buffer stores the concatenated values and the compressed output address and mask.
The write buffer bursts the concatenated values to memory at the output addresses.
Decision branches (YES/NO) also appear in this figure; their wording was not recovered.
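The figures name a "compressed output address and mask" without defining the mask. A minimal Python sketch, assuming the mask is a byte-enable bitmap with one bit per 8-bit lane (bit i set means lane i is written); `pack_with_mask` and `write_masked` are hypothetical names, and the `bytearray` merely stands in for memory.

```python
LANES = 32  # 32 lanes x 8 bits = 256 bits


def pack_with_mask(values, valid):
    """Concatenate up to 32 8-bit values into one 256-bit word (lane 0 in the
    low byte) and build an assumed byte-enable mask from per-lane valid flags."""
    if len(values) != len(valid) or len(values) > LANES:
        raise ValueError("need at most 32 values with one valid flag each")
    word = 0
    mask = 0
    for i, (value, ok) in enumerate(zip(values, valid)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"lane {i}: {value} does not fit in 8 bits")
        word |= value << (8 * i)
        if ok:
            mask |= 1 << i
    return word, mask


def write_masked(memory, address, word, mask):
    """Model of a masked burst: write only the enabled bytes of the word."""
    for i in range(LANES):
        if (mask >> i) & 1:
            memory[address + i] = (word >> (8 * i)) & 0xFF


# 20 valid averages at the end of a span; the last 12 lanes are masked off.
word, mask = pack_with_mask([150] * 32, [True] * 20 + [False] * 12)
memory = bytearray(64)
write_masked(memory, 16, word, mask)
print(hex(mask))  # 0xfffff
```

A mask like this lets the write buffer issue one full-width burst even when a span ends partway through a 256-bit word.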



FIG. 11A: Flowchart for combining interpolated frame data with IDCT data (continued in FIG. 11B):
Start.
The enhanced rasterization unit generates 16 (u, v) coordinates for the frame and a compressed representation of 16 (x, y) output coordinates.
The enhanced rasterization unit passes the 16 (u, v) coordinates for the frame and the compressed 16 (x, y) output coordinates to the enhanced Z-buffer.
The enhanced rasterization unit generates 16 (u, v) coordinates for the IDCT.
The enhanced Z-buffer passes the first 8 of the 16 (u, v) coordinates for the frame and the compressed 16 (x, y) output coordinates to the enhanced texture address unit (ETAU).
The ETAU converts the first 8 (u, v) coordinates for the frame into a first set of 32 addresses storing frame data.



FIG. 11B: Flowchart, continued:
The ETAU passes the first set of 32 addresses storing frame data and the compressed output coordinates to the enhanced texture cache.
The enhanced texture cache retrieves a first set of frame data from the first set of 32 addresses in memory and puts the data into an internal data cache area.
The enhanced texture cache loads the first set of frame data from the internal data cache area to an internal line buffer.
The enhanced Z-buffer passes the second 8 of the 16 (u, v) coordinates for the frame to the enhanced texture address unit, and the ETAU converts them into a second set of 32 addresses storing frame data.
The enhanced rasterization unit passes the 16 (u, v) coordinates for the IDCT to the enhanced Z-buffer.
A YES branch of a decision also appears in this figure; its wording was not recovered.



FIG. 11C: Flowchart, continued:
The enhanced texture cache passes the first set of frame data to a queue 0 of the VFU and the compressed output coordinates to a register of the VFU.
The ETAU passes the second set of 32 addresses storing frame data to the enhanced texture cache.
The enhanced texture cache retrieves and loads the second set of frame data from the second set of 32 addresses in memory.
The enhanced Z-buffer passes the 16 (u, v) coordinates for the IDCT to the ETAU, and the ETAU converts them into 32 addresses storing the IDCT data.
The VFU accesses interpolation coefficients from the enhanced texture address unit.



FIG. 11D: Flowchart, continued:
The VFU computes a partitioned inner product with the first set of frame data in queue 0 and the interpolation coefficients.
The VFU stores the first 64-bit inner product value in a register of the register file.
The enhanced texture cache passes the second set of frame data to queue 0 of the VFU.
The VFU computes a partitioned inner product with the second set of frame data in queue 0 and the interpolation coefficients.
The ETAU passes the 32 addresses storing IDCT data to the enhanced texture cache.
The VFU appends the second 64-bit inner product value to the register of the register file.
The enhanced texture cache retrieves and loads the IDCT data from the 32 addresses in memory.



FIG. 11E: Flowchart, continued:
The enhanced texture cache passes the IDCT data to a queue 1 of the VFU (268).
The VFU computes a partitioned add with the inner product values and the IDCT data in queue 1 to produce a 128-bit result (270).
The VFU passes the result to the output buffer (272).
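The data path of FIGS. 11A to 11E (interpolate frame data, then add IDCT data) can be sketched in Python as below. The names are hypothetical, and several details are assumptions rather than statements from the figures: each predicted pixel uses a 2x2 neighborhood (consistent with, but not confirmed by, 8 coordinates expanding to 32 addresses); the interpolation rounds half up before the right shift; the IDCT data are signed residuals; and the sum saturates to 8 bits. Sixteen 8-bit pixels give the 128-bit result named in FIG. 11E.

```python
def interpolate(neighbors, coeffs, shift):
    """One predicted pixel: inner product of its neighbors with interpolation
    coefficients, rounded and right shifted (requires shift >= 1)."""
    if len(neighbors) != len(coeffs):
        raise ValueError("one coefficient per neighbor is required")
    acc = sum(p * c for p, c in zip(neighbors, coeffs))
    return (acc + (1 << (shift - 1))) >> shift  # assumption: round half up


def motion_compensate(neighbor_sets, coeffs, shift, residuals):
    """Predict 16 pixels (two groups of 8, one per 64-bit inner-product value),
    add signed IDCT residuals with saturation, and pack into a 128-bit value."""
    if len(neighbor_sets) != 16 or len(residuals) != 16:
        raise ValueError("expected 16 pixels")
    prediction = [interpolate(n, coeffs, shift) for n in neighbor_sets]
    pixels = [max(0, min(255, p + r)) for p, r in zip(prediction, residuals)]
    # Pixel 0 goes in the least significant byte.
    return sum(v << (8 * i) for i, v in enumerate(pixels))


# Half-sample interpolation in both directions: (a + b + c + d + 2) >> 2.
result = motion_compensate([[100, 102, 104, 106]] * 16, [1, 1, 1, 1], 2,
                           [5] * 8 + [-200] * 8)
print(hex(result))  # 0x6c6c6c6c6c6c6c6c
```

In the example the first eight pixels become 103 + 5 = 108 (0x6C), and the last eight saturate to 0 because their residual of -200 would drive them negative.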



FIG. 11F: Flowchart, continued:
The output buffer generates a compressed output address and mask.
The output buffer passes the result and the compressed output address and mask to the write buffer.
The write buffer stores the result and the compressed output address and mask.
The write buffer bursts the results to memory at the output addresses.
Decision branches (YES/NO) also appear in this figure; their wording was not recovered.



